Overview

Dataset statistics

Number of variables: 25
Number of observations: 3953
Missing cells: 1047
Missing cells (%): 1.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.6 MiB
Average record size in memory: 966.5 B
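
The derived figures in this block can be checked against each other. The following is a minimal sketch, assuming that "Missing cells (%)" is the share of missing cells among all cells (variables times observations) and that the total memory size is the average record size times the number of observations; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" block of the report.
n_variables = 25
n_observations = 3953
missing_cells = 1047
avg_record_bytes = 966.5

# Assumption: every cell of the 25 x 3953 table counts towards the total.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total memory = average record size x number of records.
total_mib = avg_record_bytes * n_observations / 2**20

print(f"total cells:   {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"total memory:  {total_mib:.1f} MiB")
```

Under these assumptions the results round to the 1.1% and 3.6 MiB shown above.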

Variable types

CAT: 14
NUM: 10
BOOL: 1

Reproduction

Analysis started: 2020-05-16 03:38:37.582264
Analysis finished: 2020-05-16 03:39:05.953347
Version: pandas-profiling v2.6.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

name has a high cardinality: 3682 distinct values (High cardinality)
email_id has a high cardinality: 3373 distinct values (High cardinality)
dt_applied has a high cardinality: 3953 distinct values (High cardinality)
university has a high cardinality: 3140 distinct values (High cardinality)
zip_code has a high cardinality: 615 distinct values (High cardinality)
funded_amnt_inv is highly correlated with loan_amnt and 1 other field (High correlation)
loan_amnt is highly correlated with funded_amnt_inv and 1 other field (High correlation)
installment is highly correlated with loan_amnt and 1 other field (High correlation)
sub_grade is highly correlated with grade (High correlation)
grade is highly correlated with sub_grade (High correlation)
name has 271 (6.9%) missing values (Missing)
email_id has 580 (14.7%) missing values (Missing)
gender has 78 (2.0%) missing values (Missing)
university has 118 (3.0%) missing values (Missing)
delinq_2yrs has 3628 (91.8%) zeros (Zeros)
inq_last_6mths has 1822 (46.1%) zeros (Zeros)
revol_bal has 42 (1.1%) zeros (Zeros)
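
The missing-value and zero warnings are simple shares of the 3953 rows. The sketch below shows one way such messages could be derived; it is not the pandas-profiling implementation. The function name `column_warnings` and the `zero_share` threshold are hypothetical: the report does not state its thresholds, and 1% is only a guess consistent with revol_bal (1.1% zeros) being flagged while dti (0.1% zeros) is not.

```python
def column_warnings(name, values, zero_share=0.01):
    """Build report-style warning strings for one column.

    `values` is a list in which None marks a missing cell.
    `zero_share` is an assumed threshold, not taken from the report.
    """
    n = len(values)
    if n == 0:
        return []
    messages = []

    n_missing = sum(v is None for v in values)
    if n_missing:
        messages.append(f"{name} has {n_missing} ({n_missing / n:.1%}) missing values")

    # Only numeric cells can be zeros; strings and None are skipped.
    n_zeros = sum(v == 0 for v in values if isinstance(v, (int, float)))
    if n_zeros and n_zeros / n >= zero_share:
        messages.append(f"{name} has {n_zeros} ({n_zeros / n:.1%}) zeros")
    return messages


# Synthetic columns with the same counts as in the report (not the real data).
delinq_2yrs = [0] * 3628 + [1] * 325
gender = [None] * 78 + ["Male"] * 1970 + ["Female"] * 1905

print(column_warnings("delinq_2yrs", delinq_2yrs))
print(column_warnings("gender", gender))
```

The two synthetic columns only reproduce the counts of zeros and missing cells; the remaining values are placeholders.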

Variables

name
Categorical

HIGH CARDINALITY
MISSING
UNIFORM
Distinct count: 3682
Unique (%): 100.0%
Missing: 271
Missing (%): 6.9%
Memory size: 31.0 KiB
Torrey Avraham: 1
Birk Shufflebotham: 1
Chance Zannini: 1
Ginevra Sowthcote: 1
Vally De Bischop: 1
Other values (3677): 3677
ValueCountFrequency (%) 
Torrey Avraham 1 < 0.1%
 
Birk Shufflebotham 1 < 0.1%
 
Chance Zannini 1 < 0.1%
 
Ginevra Sowthcote 1 < 0.1%
 
Vally De Bischop 1 < 0.1%
 
Jacynth Royson 1 < 0.1%
 
Lilian Lambell 1 < 0.1%
 
Hector Bates 1 < 0.1%
 
Gage Washtell 1 < 0.1%
 
Davide Rickword 1 < 0.1%
 
Other values (3672) 3672 92.9%
 
(Missing) 271 6.9%
 

Length

Max length: 23
Mean length: 13.27649886
Min length: 3
ValueCountFrequency (%) 
Uppercase_Letter 26 44.8%
 
Lowercase_Letter 26 44.8%
 
Other_Punctuation 3 5.2%
 
Space_Separator 1 1.7%
 
Close_Punctuation 1 1.7%
 
Dash_Punctuation 1 1.7%
 
ValueCountFrequency (%) 
Latin 52 89.7%
 
Common 6 10.3%
 
ValueCountFrequency (%) 
ASCII 58 100.0%
 

email_id
Categorical

HIGH CARDINALITY
MISSING
UNIFORM
Distinct count: 3373
Unique (%): 100.0%
Missing: 580
Missing (%): 14.7%
Memory size: 31.0 KiB
jdeck@com.com: 1
gswyre2x@hp.com: 1
bdewingla@rambler.ru: 1
fbessellfa@goo.gl: 1
mtearny9i@google.ca: 1
Other values (3368): 3368
ValueCountFrequency (%) 
jdeck@com.com 1 < 0.1%
 
gswyre2x@hp.com 1 < 0.1%
 
bdewingla@rambler.ru 1 < 0.1%
 
fbessellfa@goo.gl 1 < 0.1%
 
mtearny9i@google.ca 1 < 0.1%
 
oespinheiramc@shop-pro.jp 1 < 0.1%
 
jpellingtondz@phpbb.com 1 < 0.1%
 
tstoodersl8@zdnet.com 1 < 0.1%
 
blysth1@ycombinator.com 1 < 0.1%
 
rbirrelh6@ustream.tv 1 < 0.1%
 
Other values (3363) 3363 85.1%
 
(Missing) 580 14.7%
 

Length

Max length: 35
Mean length: 19.06982039
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 26 66.7%
 
Decimal_Number 10 25.6%
 
Other_Punctuation 2 5.1%
 
Dash_Punctuation 1 2.6%
 
ValueCountFrequency (%) 
Latin 26 66.7%
 
Common 13 33.3%
 
ValueCountFrequency (%) 
ASCII 39 100.0%
 

gender
Categorical

MISSING
Distinct count: 2
Unique (%): 0.1%
Missing: 78
Missing (%): 2.0%
Memory size: 31.0 KiB
Male: 1970
Female: 1905
ValueCountFrequency (%) 
Male 1970 49.8%
 
Female 1905 48.2%
 
(Missing) 78 2.0%
 

Length

Max length: 6
Mean length: 4.944093094
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 5 71.4%
 
Uppercase_Letter 2 28.6%
 
ValueCountFrequency (%) 
Latin 7 100.0%
 
ValueCountFrequency (%) 
ASCII 7 100.0%
 

dt_applied
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE
Distinct count: 3953
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 31.0 KiB
07/11/86: 1
29/02/84: 1
02/02/88: 1
29/04/89: 1
05/08/82: 1
Other values (3948): 3948
ValueCountFrequency (%) 
07/11/86 1 < 0.1%
 
29/02/84 1 < 0.1%
 
02/02/88 1 < 0.1%
 
29/04/89 1 < 0.1%
 
05/08/82 1 < 0.1%
 
30/01/83 1 < 0.1%
 
22/01/88 1 < 0.1%
 
05/07/91 1 < 0.1%
 
27/06/89 1 < 0.1%
 
14/02/90 1 < 0.1%
 
Other values (3943) 3943 99.7%
 

Length

Max length: 8
Mean length: 8
Min length: 8
ValueCountFrequency (%) 
Decimal_Number 10 90.9%
 
Other_Punctuation 1 9.1%
 
ValueCountFrequency (%) 
Common 11 100.0%
 
ValueCountFrequency (%) 
ASCII 11 100.0%
 

university
Categorical

HIGH CARDINALITY
MISSING
UNIFORM
Distinct count: 3140
Unique (%): 81.9%
Missing: 118
Missing (%): 3.0%
Memory size: 31.0 KiB
Carlow College: 4
Abant Izzet Baysal University: 4
Universidad de Congreso: 4
Universidad Tecnológica de México: 4
Phillips Graduate Institute: 4
Other values (3135): 3815
ValueCountFrequency (%) 
Carlow College 4 0.1%
 
Abant Izzet Baysal University 4 0.1%
 
Universidad de Congreso 4 0.1%
 
Universidad Tecnológica de México 4 0.1%
 
Phillips Graduate Institute 4 0.1%
 
Arab Open University 4 0.1%
 
Stavropol State Technical University 4 0.1%
 
Universidad Valle del Momboy 4 0.1%
 
Jiangxi University of Traditional Chinese Medicine 4 0.1%
 
Fukuoka Institute of Technology 4 0.1%
 
Other values (3130) 3795 96.0%
 
(Missing) 118 3.0%
 

Length

Max length: 114
Mean length: 29.67088287
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 47 48.0%
 
Uppercase_Letter 29 29.6%
 
Decimal_Number 8 8.2%
 
Other_Punctuation 6 6.1%
 
Control 2 2.0%
 
Space_Separator 1 1.0%
 
Open_Punctuation 1 1.0%
 
Close_Punctuation 1 1.0%
 
Dash_Punctuation 1 1.0%
 
Final_Punctuation 1 1.0%
 
ValueCountFrequency (%) 
Latin 76 77.6%
 
Common 22 22.4%
 
ValueCountFrequency (%) 
ASCII 70 97.2%
 
Punctuation 2 2.8%
 

loan_amnt
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 434
Unique (%): 11.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13017.499367568935
Minimum: 1000
Maximum: 35000
Zeros: 0
Zeros (%): 0.0%
Memory size: 31.0 KiB

Quantile statistics

Minimum: 1000
5th percentile: 3000
Q1: 6500
Median: 12000
Q3: 17625
95th percentile: 30000
Maximum: 35000
Range: 34000
Interquartile range (IQR): 11125

Descriptive statistics

Standard deviation: 8155.330342
Coefficient of variation (CV): 0.6264897821
Kurtosis: 0.3258532123
Mean: 13017.49937
Mean absolute deviation (MAD): 6481.432333
Skewness: 0.9233128761
Sum: 51458175
Variance: 66509412.98
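
Several of the loan_amnt statistics above follow from one another. The short check below is a sketch that recomputes the range, IQR, standard deviation, coefficient of variation and sum from the reported values, using the usual textbook definitions (range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean). The variable names are illustrative.

```python
import math

# loan_amnt figures copied from the report.
n_observations = 3953
minimum, maximum = 1000, 35000
q1, q3 = 6500, 17625
mean = 13017.499367568935
variance = 66509412.98

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
std = math.sqrt(variance)         # Standard deviation
cv = std / mean                   # Coefficient of variation
total = mean * n_observations     # Sum

print(value_range, iqr)
print(round(std, 6), round(cv, 6), round(total))
```

The recomputed values agree with the report to the precision shown there.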
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1000. 1050. 1975. 2050. 2350. ... 28050. 29925. 30200. 34737.5 35000. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
12000 315 8.0%
 
10000 259 6.6%
 
15000 190 4.8%
 
20000 174 4.4%
 
6000 165 4.2%
 
5000 153 3.9%
 
35000 143 3.6%
 
8000 124 3.1%
 
16000 99 2.5%
 
25000 97 2.5%
 
Other values (424) 2234 56.5%
 
ValueCountFrequency (%) 
1000 21 0.5%
 
1100 1 < 0.1%
 
1200 9 0.2%
 
1300 2 0.1%
 
1325 1 < 0.1%
 
ValueCountFrequency (%) 
35000 143 3.6%
 
34475 1 < 0.1%
 
34000 2 0.1%
 
33950 1 < 0.1%
 
33600 2 0.1%
 

funded_amnt_inv
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 828
Unique (%): 20.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12809.792160966355
Minimum: 750.0
Maximum: 35000.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 31.0 KiB

Quantile statistics

Minimum: 750
5th percentile: 3000
Q1: 6500
Median: 11775
Q3: 17000
95th percentile: 29735
Maximum: 35000
Range: 34250
Interquartile range (IQR): 10500

Descriptive statistics

Standard deviation: 7935.907682
Coefficient of variation (CV): 0.619518848
Kurtosis: 0.3951370723
Mean: 12809.79216
Mean absolute deviation (MAD): 6291.457434
Skewness: 0.9263171893
Sum: 50637108.41
Variance: 62978630.74
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 750. 1975. 2025. 2350. 2412.5 ... 34948.00626 34972.090825 34975.40818 34998.676225 35000. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
12000 249 6.3%
 
10000 222 5.6%
 
6000 153 3.9%
 
5000 143 3.6%
 
15000 139 3.5%
 
8000 113 2.9%
 
7000 87 2.2%
 
3000 74 1.9%
 
20000 72 1.8%
 
14000 64 1.6%
 
Other values (818) 2637 66.7%
 
ValueCountFrequency (%) 
750 1 < 0.1%
 
1000 20 0.5%
 
1100 1 < 0.1%
 
1200 9 0.2%
 
1300 2 0.1%
 
ValueCountFrequency (%) 
35000 37 0.9%
 
34997.35245 1 < 0.1%
 
34993.65539 1 < 0.1%
 
34987.98452 1 < 0.1%
 
34987.27101 1 < 0.1%
 

term
Categorical

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 31.0 KiB
36 months: 2687
60 months: 1266
ValueCountFrequency (%) 
36 months 2687 68.0%
 
60 months 1266 32.0%
 

Length

Max length: 10
Mean length: 10
Min length: 10
ValueCountFrequency (%) 
Lowercase_Letter 6 60.0%
 
Decimal_Number 3 30.0%
 
Space_Separator 1 10.0%
 
ValueCountFrequency (%) 
Latin 6 60.0%
 
Common 4 40.0%
 
ValueCountFrequency (%) 
ASCII 10 100.0%
 

int_rate
Real number (ℝ≥0)

Distinct count: 35
Unique (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1296908676954212
Minimum: 0.06
Maximum: 0.24100000000000002
Zeros: 0
Zeros (%): 0.0%
Memory size: 31.0 KiB

Quantile statistics

Minimum: 0.06
5th percentile: 0.066
Q1: 0.099
Median: 0.127
Q3: 0.16
95th percentile: 0.203
Maximum: 0.241
Range: 0.181
Interquartile range (IQR): 0.061

Descriptive statistics

Standard deviation: 0.04160931484
Coefficient of variation (CV): 0.3208345782
Kurtosis: -0.6951924625
Mean: 0.1296908677
Mean absolute deviation (MAD): 0.03406419113
Skewness: 0.226416223
Sum: 512.668
Variance: 0.001731335081
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.06 0.063 0.077 0.084 0.112 ... 0.1745 0.192 0.206 0.2255 0.241 ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0.117 324 8.2%
 
0.127 259 6.6%
 
0.079 259 6.6%
 
0.124 254 6.4%
 
0.135 231 5.8%
 
0.143 226 5.7%
 
0.107 213 5.4%
 
0.099 211 5.3%
 
0.089 198 5.0%
 
0.06 160 4.0%
 
Other values (25) 1618 40.9%
 
ValueCountFrequency (%) 
0.06 160 4.0%
 
0.066 156 3.9%
 
0.075 137 3.5%
 
0.079 259 6.6%
 
0.089 198 5.0%
 
ValueCountFrequency (%) 
0.241 2 0.1%
 
0.239 6 0.2%
 
0.235 6 0.2%
 
0.231 4 0.1%
 
0.227 6 0.2%
 

installment
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 1923
Unique (%): 48.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 375.2073362003542
Minimum: 32.23
Maximum: 1283.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 31.0 KiB

Quantile statistics

Minimum: 32.23
5th percentile: 93.88
Q1: 205.86
Median: 336
Q3: 494.59
95th percentile: 813.626
Maximum: 1283.5
Range: 1251.27
Interquartile range (IQR): 288.73

Descriptive statistics

Standard deviation: 220.261152
Coefficient of variation (CV): 0.5870385006
Kurtosis: 0.8900854243
Mean: 375.2073362
Mean absolute deviation (MAD): 171.9058561
Skewness: 0.9837168213
Sum: 1483194.6
Variance: 48514.9751
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 32.23 76.085 152.155 152.285 153.485 ... 821.125 983.32 1091.75 1113.46 1283.5 ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
330.76 27 0.7%
 
396.92 25 0.6%
 
325.74 22 0.6%
 
386.7 21 0.5%
 
339.31 20 0.5%
 
322.25 19 0.5%
 
334.16 19 0.5%
 
343.09 18 0.5%
 
190.52 18 0.5%
 
368.45 17 0.4%
 
Other values (1913) 3747 94.8%
 
ValueCountFrequency (%) 
32.23 1 < 0.1%
 
32.58 2 0.1%
 
33.08 2 0.1%
 
33.55 1 < 0.1%
 
33.94 3 0.1%
 
ValueCountFrequency (%) 
1283.5 1 < 0.1%
 
1276.6 1 < 0.1%
 
1269.73 1 < 0.1%
 
1243.85 1 < 0.1%
 
1222.03 1 < 0.1%
 

grade
Categorical

HIGH CORRELATION
Distinct count: 7
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 31.0 KiB
B: 1262
A: 908
C: 811
D: 510
E: 313
Other values (2): 149
ValueCountFrequency (%) 
B 1262 31.9%
 
A 908 23.0%
 
C 811 20.5%
 
D 510 12.9%
 
E 313 7.9%
 
F 125 3.2%
 
G 24 0.6%
 

Length

Max length: 1
Mean length: 1
Min length: 1
ValueCountFrequency (%) 
Uppercase_Letter 7 100.0%
 
ValueCountFrequency (%) 
Latin 7 100.0%
 
ValueCountFrequency (%) 
ASCII 7 100.0%
 

sub_grade
Categorical

HIGH CORRELATION
Distinct count: 35
Unique (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 31.0 KiB
B3: 324
B5: 260
A4: 259
B4: 254
C1: 231
Other values (30): 2625
ValueCountFrequency (%) 
B3 324 8.2%
 
B5 260 6.6%
 
A4 259 6.6%
 
B4 254 6.4%
 
C1 231 5.8%
 
C2 227 5.7%
 
B2 213 5.4%
 
B1 211 5.3%
 
A5 198 5.0%
 
A1 158 4.0%
 
Other values (25) 1618 40.9%
 

Length

Max length: 2
Mean length: 2
Min length: 2
ValueCountFrequency (%) 
Uppercase_Letter 7 58.3%
 
Decimal_Number 5 41.7%
 
ValueCountFrequency (%) 
Latin 7 58.3%
 
Common 5 41.7%
 
ValueCountFrequency (%) 
ASCII 12 100.0%
 

home_ownership
Categorical

Distinct count: 3
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 31.0 KiB
RENT: 2081
MORTGAGE: 1577
OWN: 295
ValueCountFrequency (%) 
RENT 2081 52.6%
 
MORTGAGE 1577 39.9%
 
OWN 295 7.5%
 

Length

Max length: 8
Mean length: 5.521123198
Min length: 3
ValueCountFrequency (%) 
Uppercase_Letter 9 100.0%
 
ValueCountFrequency (%) 
Latin 9 100.0%
 
ValueCountFrequency (%) 
ASCII 9 100.0%
 

annual_inc
Real number (ℝ≥0)

Distinct count: 813
Unique (%): 20.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 66175.9735365545
Minimum: 8280.0
Maximum: 550000.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 31.0 KiB

Quantile statistics

Minimum: 8280
5th percentile: 25000
Q1: 40100
Median: 57000
Q3: 80000
95th percentile: 135880
Maximum: 550000
Range: 541720
Interquartile range (IQR): 39900

Descriptive statistics

Standard deviation: 40498.80417
Coefficient of variation (CV): 0.6119865264
Kurtosis: 18.71426089
Mean: 66175.97354
Mean absolute deviation (MAD): 27257.62778
Skewness: 3.058200935
Sum: 261593623.4
Variance: 1640153139
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 8280. 11910. 12126. 19100. 19220. ... 180198. 197500. 202000. 312500. 550000.], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
60000 154 3.9%
 
50000 149 3.8%
 
75000 120 3.0%
 
40000 120 3.0%
 
45000 114 2.9%
 
70000 96 2.4%
 
80000 93 2.4%
 
30000 93 2.4%
 
65000 88 2.2%
 
35000 82 2.1%
 
Other values (803) 2844 71.9%
 
ValueCountFrequency (%) 
8280 1 < 0.1%
 
8400 1 < 0.1%
 
9600 1 < 0.1%
 
9960 1 < 0.1%
 
10000 1 < 0.1%
 
ValueCountFrequency (%) 
550000 1 < 0.1%
 
525000 1 < 0.1%
 
408000 1 < 0.1%
 
400000 2 0.1%
 
365000 1 < 0.1%
 

verification_status
Categorical

Distinct count: 3
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 31.0 KiB
Verified: 1515
Not Verified: 1247
Source Verified: 1191
ValueCountFrequency (%) 
Verified 1515 38.3%
 
Not Verified 1247 31.5%
 
Source Verified 1191 30.1%
 

Length

Max length: 15
Mean length: 11.37085758
Min length: 8
ValueCountFrequency (%) 
Lowercase_Letter 9 69.2%
 
Uppercase_Letter 3 23.1%
 
Space_Separator 1 7.7%
 
ValueCountFrequency (%) 
Latin 12 92.3%
 
Common 1 7.7%
 
ValueCountFrequency (%) 
ASCII 13 100.0%
 

loan_writeoff
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 31.0 KiB
0: 3275
1: 678
ValueCountFrequency (%) 
0 3275 82.8%
 
1 678 17.2%
 

purpose
Categorical

Distinct count: 13
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 31.0 KiB
debt_consolidation: 2102
credit_card: 792
other: 297
home_improvement: 196
small_business: 145
Other values (8): 421
ValueCountFrequency (%) 
debt_consolidation 2102 53.2%
 
credit_card 792 20.0%
 
other 297 7.5%
 
home_improvement 196 5.0%
 
small_business 145 3.7%
 
major_purchase 100 2.5%
 
car 90 2.3%
 
wedding 63 1.6%
 
medical 52 1.3%
 
moving 39 1.0%
 
Other values (3) 77 1.9%
 

Length

Max length: 18
Mean length: 14.28307614
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 21 95.5%
 
Connector_Punctuation 1 4.5%
 
ValueCountFrequency (%) 
Latin 21 95.5%
 
Common 1 4.5%
 
ValueCountFrequency (%) 
ASCII 22 100.0%
 

zip_code
Categorical

HIGH CARDINALITY
Distinct count: 615
Unique (%): 15.6%
Missing: 0
Missing (%): 0.0%
Memory size: 31.0 KiB
606xx: 55
900xx: 55
100xx: 54
112xx: 50
945xx: 49
Other values (610): 3690
ValueCountFrequency (%) 
606xx 55 1.4%
 
900xx 55 1.4%
 
100xx 54 1.4%
 
112xx 50 1.3%
 
945xx 49 1.2%
 
070xx 45 1.1%
 
331xx 44 1.1%
 
750xx 41 1.0%
 
300xx 41 1.0%
 
113xx 40 1.0%
 
Other values (605) 3479 88.0%
 

Length

Max length: 5
Mean length: 5
Min length: 5
ValueCountFrequency (%) 
Decimal_Number 10 90.9%
 
Lowercase_Letter 1 9.1%
 
ValueCountFrequency (%) 
Common 10 90.9%
 
Latin 1 9.1%
 
ValueCountFrequency (%) 
ASCII 11 100.0%
 

add_state
Categorical

Distinct count: 43
Unique (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 31.0 KiB
CA: 729
NY: 372
FL: 304
TX: 273
NJ: 181
Other values (38): 2094
ValueCountFrequency (%) 
CA 729 18.4%
 
NY 372 9.4%
 
FL 304 7.7%
 
TX 273 6.9%
 
NJ 181 4.6%
 
IL 155 3.9%
 
GA 146 3.7%
 
PA 136 3.4%
 
VA 130 3.3%
 
OH 124 3.1%
 
Other values (33) 1403 35.5%
 

Length

Max length: 2
Mean length: 2
Min length: 2
ValueCountFrequency (%) 
Uppercase_Letter 24 100.0%
 
ValueCountFrequency (%) 
Latin 24 100.0%
 
ValueCountFrequency (%) 
ASCII 24 100.0%
 

dti
Real number (ℝ≥0)

Distinct count: 1961
Unique (%): 49.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.428287376675943
Minimum: 0.0
Maximum: 29.85
Zeros: 3
Zeros (%): 0.1%
Memory size: 31.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 3.932
Q1: 9.58
Median: 14.45
Q3: 19.47
95th percentile: 24.214
Maximum: 29.85
Range: 29.85
Interquartile range (IQR): 9.89

Descriptive statistics

Standard deviation: 6.378445753
Coefficient of variation (CV): 0.4420792008
Kurtosis: -0.7703420751
Mean: 14.42828738
Mean absolute deviation (MAD): 5.34898955
Skewness: -0.04903565752
Sum: 57035.02
Variance: 40.68457022
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 3.73 6.095 8.675 20.055 23.945 24.985 29.85 ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
11.8 9 0.2%
 
18.63 8 0.2%
 
20.88 8 0.2%
 
9.65 7 0.2%
 
12.48 7 0.2%
 
18.84 7 0.2%
 
17.67 7 0.2%
 
16.4 7 0.2%
 
19.63 7 0.2%
 
16.2 7 0.2%
 
Other values (1951) 3879 98.1%
 
ValueCountFrequency (%) 
0 3 0.1%
 
0.02 2 0.1%
 
0.07 1 < 0.1%
 
0.2 1 < 0.1%
 
0.25 1 < 0.1%
 
ValueCountFrequency (%) 
29.85 1 < 0.1%
 
29.83 1 < 0.1%
 
29.73 1 < 0.1%
 
29.72 1 < 0.1%
 
29.63 1 < 0.1%
 

delinq_2yrs
Real number (ℝ≥0)

ZEROS
Distinct count: 6
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.10852517075638755
Minimum: 0
Maximum: 6
Zeros: 3628
Zeros (%): 91.8%
Memory size: 31.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 1
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.4087983222
Coefficient of variation (CV): 3.766852606
Kurtosis: 32.99870086
Mean: 0.1085251708
Mean absolute deviation (MAD): 0.1992053223
Skewness: 4.954297207
Sum: 429
Variance: 0.1671160683
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.5 1.5 2.5 3.5 6. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 3628 91.8%
 
1 246 6.2%
 
2 61 1.5%
 
3 13 0.3%
 
4 4 0.1%
 
6 1 < 0.1%
 
ValueCountFrequency (%) 
0 3628 91.8%
 
1 246 6.2%
 
2 61 1.5%
 
3 13 0.3%
 
4 4 0.1%
 
ValueCountFrequency (%) 
6 1 < 0.1%
 
4 4 0.1%
 
3 13 0.3%
 
2 61 1.5%
 
1 246 6.2%
 

inq_last_6mths
Real number (ℝ≥0)

ZEROS
Distinct count: 9
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.8555527447508222
Minimum: 0
Maximum: 8
Zeros: 1822
Zeros (%): 46.1%
Memory size: 31.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 1
Q3: 1
95th percentile: 3
Maximum: 8
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.997025005
Coefficient of variation (CV): 1.165357731
Kurtosis: 2.163689287
Mean: 0.8555527448
Mean absolute deviation (MAD): 0.7886754874
Skewness: 1.26526022
Sum: 3382
Variance: 0.9940588606
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.5 1.5 2.5 3.5 5.5 8. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 1822 46.1%
 
1 1245 31.5%
 
2 584 14.8%
 
3 265 6.7%
 
4 21 0.5%
 
5 10 0.3%
 
6 3 0.1%
 
7 2 0.1%
 
8 1 < 0.1%
 
ValueCountFrequency (%) 
0 1822 46.1%
 
1 1245 31.5%
 
2 584 14.8%
 
3 265 6.7%
 
4 21 0.5%
 
ValueCountFrequency (%) 
8 1 < 0.1%
 
7 2 0.1%
 
6 3 0.1%
 
5 10 0.3%
 
4 21 0.5%
 

pub_rec
Categorical

Distinct count: 3
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 31.0 KiB
0: 3831
1: 120
2: 2
ValueCountFrequency (%) 
0 3831 96.9%
 
1 120 3.0%
 
2 2 0.1%
 

Length

Max length: 1
Mean length: 1
Min length: 1
ValueCountFrequency (%) 
Decimal_Number 3 100.0%
 
ValueCountFrequency (%) 
Common 3 100.0%
 
ValueCountFrequency (%) 
ASCII 3 100.0%
 

revol_bal
Real number (ℝ≥0)

ZEROS
Distinct count: 3672
Unique (%): 92.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14367.447508221603
Minimum: 0
Maximum: 140967
Zeros: 42
Zeros (%): 1.1%
Memory size: 31.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 1240.4
Q1: 6352
Median: 11449
Q3: 18151
95th percentile: 35148.4
Maximum: 140967
Range: 140967
Interquartile range (IQR): 11799

Descriptive statistics

Standard deviation: 13468.63453
Coefficient of variation (CV): 0.937441012
Kurtosis: 18.01764983
Mean: 14367.44751
Mean absolute deviation (MAD): 8653.89346
Skewness: 3.322035836
Sum: 56794520
Variance: 181404116.1
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000e+00 1.50000e+00 5.52300e+03 1.21895e+04 1.89260e+04 ... 2.97645e+04 3.37135e+04 4.79720e+04 7.20315e+04 1.40967e+05], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 42 1.1%
 
8032 3 0.1%
 
6314 3 0.1%
 
14848 3 0.1%
 
10980 3 0.1%
 
11338 3 0.1%
 
15183 3 0.1%
 
8357 3 0.1%
 
6565 3 0.1%
 
13034 3 0.1%
 
Other values (3662) 3884 98.3%
 
ValueCountFrequency (%) 
0 42 1.1%
 
3 1 < 0.1%
 
6 1 < 0.1%
 
8 1 < 0.1%
 
16 1 < 0.1%
 
ValueCountFrequency (%) 
140967 1 < 0.1%
 
131949 1 < 0.1%
 
130920 1 < 0.1%
 
124744 1 < 0.1%
 
123416 1 < 0.1%
 

total_paymnt
Real number (ℝ≥0)

Distinct count: 3710
Unique (%): 93.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14435.064318165443
Minimum: 0.0
Maximum: 58886.47343
Zeros: 2
Zeros (%): 0.1%
Memory size: 31.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 2401.064047
Q1: 6614.78722
Median: 11907.35
Q3: 19190.68001
95th percentile: 35788.92425
Maximum: 58886.47343
Range: 58886.47343
Interquartile range (IQR): 12575.89279

Descriptive statistics

Standard deviation: 10492.53033
Coefficient of variation (CV): 0.7268779753
Kurtosis: 1.593830926
Mean: 14435.06432
Mean absolute deviation (MAD): 8121.314833
Skewness: 1.261678967
Sum: 57061809.25
Variance: 110093192.6
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 910.17 2699.89 3480.19083 3480.2699995 ... 40009.00912 40009.672485 47160.0108 47160.13944 58886.47343 ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
14288.76169 8 0.2%
 
13148.13786 7 0.2%
 
11907.34732 7 0.2%
 
12029.45 7 0.2%
 
11600.98 6 0.2%
 
14288.77 5 0.1%
 
11726.32 5 0.1%
 
10956.77596 5 0.1%
 
9011.557494 5 0.1%
 
13263.96 5 0.1%
 
Other values (3700) 3893 98.5%
 
ValueCountFrequency (%) 
0 2 0.1%
 
91.39 1 < 0.1%
 
151.8 1 < 0.1%
 
165.37 1 < 0.1%
 
203.55 1 < 0.1%
 
ValueCountFrequency (%) 
58886.47343 1 < 0.1%
 
58133.3199 1 < 0.1%
 
58090.95207 1 < 0.1%
 
58071.19982 1 < 0.1%
 
58071.19977 1 < 0.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only the direction of the slope determines its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
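
The calculation described above can be sketched in plain Python as follows. The helper name `pearson_r` is illustrative and not part of pandas-profiling; the sample values are loan_amnt and installment for rows 0 to 4 of the "First rows" sample, so the result only illustrates the formula and says nothing about the full dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of xs and ys over the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors of covariance and standard deviations
    # cancel, so plain sums of products are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# loan_amnt and installment for rows 0-4 of the "First rows" sample.
loan_amnt = [5000, 2500, 2400, 10000, 3000]
installment = [162.87, 59.83, 84.33, 339.31, 67.79]
r = pearson_r(loan_amnt, installment)

# Shifting and rescaling one variable (positive factor) leaves r unchanged.
r_rescaled = pearson_r(loan_amnt, [2 * v + 100 for v in installment])
print(round(r, 4), round(r_rescaled, 4))
```

The second call illustrates the invariance under changes of location and scale: both printed values are the same.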

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
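
A minimal sketch of this calculation: rank both variables, then apply the Pearson formula to the ranks. The helper names are illustrative, and the tie handling (tied values share the mean of their positions) is a common convention assumed here, not something stated in the report.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is tied with the first.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance over the product of standard deviations (scale factors cancel)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship (synthetic data).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

For this synthetic example ρ reaches 1 because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.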

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
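
The pair counting described above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition with an illustrative helper name; library implementations such as scipy.stats.kendalltau default to a tie-adjusted variant (tau-b), so results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # Pairs tied in x or y count as neither.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Synthetic example: two concordant pairs and one discordant pair.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The quadratic loop over all pairs is fine for illustration; for thousands of rows a library routine would be the more practical choice.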

Missing values

Sample

First rows

name, email_id, gender, dt_applied, university, loan_amnt, funded_amnt_inv, term, int_rate, installment, grade, sub_grade, home_ownership, annual_inc, verification_status, loan_writeoff, purpose, zip_code, add_state, dti, delinq_2yrs, inq_last_6mths, pub_rec, revol_bal, total_paymnt
0Calley Gironcgiron0@ehow.comFemale01/01/81Warner Southern College50004975.036 months0.107162.87BB2RENT24000.0Verified0credit_card860xxAZ27.65010136485863.155187
1Linus Studlstud1@washington.eduMale02/01/81Shri Lal Bahadur Shastri Rashtriya Sanskrit Vidyapeetha25002500.060 months0.15359.83CC4RENT30000.0Source Verified1car309xxGA1.0005016871014.530000
2Lorelle Ambagelambage2@wix.comFemale03/01/81Technische Universität Bergakademie Freiberg24002400.036 months0.16084.33CC5RENT12252.0Not Verified0small_business606xxIL8.7202029563005.666844
3Anna-diane Larratalarrat3@economist.comFemale04/01/81Divine Word College of Legazpi1000010000.036 months0.135339.31CC1RENT49200.0Source Verified0other917xxCA20.00010559812231.890000
4Gill RuskeNaNFemale05/01/81East China Jiao Tong University30003000.060 months0.12767.79BB5RENT80000.0Source Verified0other972xxOR17.94000277834066.908161
5Evelyn MacFaulemacfaul5@theatlantic.comFemale06/01/81Ahmedabad University50005000.036 months0.079156.46AA4RENT36000.0Source Verified0wedding852xxAZ11.2003079635632.210000
6Ainslie Rainardarainard6@virginia.eduFemale07/01/81NaN70007000.060 months0.160170.08CC5RENT47004.0Not Verified0debt_consolidation280xxNC23.510101772610137.840010
7Emmott Hambyehamby7@prnewswire.comMale08/01/81Institute of Business Management30003000.036 months0.186109.43EE1RENT48000.0Source Verified0car900xxCA5.3502082213939.135294
8Shem Toomerstoomer8@home.plMale09/01/81Osaka University of Education56005600.060 months0.213152.39FF2OWN40000.0Source Verified1small_business958xxCA5.550205210647.500000
9Giana Aberhartgaberhart9@mozilla.comFemale10/01/81American Public University53755350.060 months0.127121.45BB5RENT15000.0Verified1other774xxTX18.0800092791484.590000

Last rows

name, email_id, gender, dt_applied, university, loan_amnt, funded_amnt_inv, term, int_rate, installment, grade, sub_grade, home_ownership, annual_inc, verification_status, loan_writeoff, purpose, zip_code, add_state, dti, delinq_2yrs, inq_last_6mths, pub_rec, revol_bal, total_paymnt
3943Merla Thebemthebeq7@cocolog-nifty.comFemale21/10/91North Eastern Hill University60006000.036 months0.163211.81DD1RENT39564.0Verified1debt_consolidation606xxIL23.7821020283388.960000
3944Marcellina Dinnegesmdinnegesq8@infoseek.co.jpFemale22/10/91Universidade Católica de Santos24002400.036 months0.11779.39BB3RENT39800.0Not Verified0other303xxGA14.32000154972836.660516
3945Way Symondswsymondsq9@mlb.comMale23/10/91American International University West Africa2500025000.060 months0.183638.25DD5MORTGAGE156000.0Source Verified0house944xxCA5.850001070937936.750000
3946Ailene MatejkaNaNFemale24/10/91Kaya University2000020000.036 months0.117661.52BB3RENT80700.0Verified0debt_consolidation946xxCA13.67010721123406.523000
3947Samuel OverelNaNMale25/10/91Northwestern University1200012000.060 months0.183306.36DD5MORTGAGE34000.0Not Verified1debt_consolidation177xxPA12.5600061149667.950000
3948Corbie Creeboeccreeboeqc@sitemeter.comMale26/10/91Shaheed Rajaei Teacher Training University1200012000.036 months0.135407.17CC1RENT125000.0Source Verified0wedding086xxNJ13.180104628614657.917650
3949Bobbe Ochterloniebochterlonieqd@ezinearticles.comFemale27/10/91Dhofar University1500015000.036 months0.124501.23BB4RENT72000.0Verified0debt_consolidation104xxNY7.470101214716729.253640
3950Corella Espositocespositoqe@macromedia.comFemale28/10/91University of Jan Evangelista Purkyne1200012000.036 months0.060365.23AA1OWN48000.0Not Verified0debt_consolidation365xxAL23.350002238513148.137860
3951Prince Dibdinpdibdinqf@businessinsider.comMale29/10/91College in Sládkovičovo1500015000.060 months0.160364.46CC5RENT50000.0Verified1debt_consolidation907xxCA18.26010979910883.540000
3952Georgette Warrattgwarrattqg@java.comFemale30/10/91Technical University of Lublin1500014975.060 months0.153358.98CC4MORTGAGE32976.0Not Verified1debt_consolidation177xxPA17.90010795611704.260000